Parallelize the exhaustive float32 sweeps across hardware threads (~75-88x) #383

Merged
lemire merged 2 commits into fastfloat:main from redis-performance:pr/parallel-exhaustive on Jun 1, 2026

@fcostaoliveira
Contributor

exhaustive32_midpoint sweeps all 2³² float bit-patterns one at a time, which takes
~30 min on a fast machine. The values are independent, so this splits the range across
std::thread::hardware_concurrency() threads, with an atomic flag for fail-fast. The
per-value work is factored into a check_word() helper; the checks and pass/fail
behavior are unchanged.
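
A minimal sketch of the range split described above, not the PR's actual code. The names `parallel_sweep` and `check_fn` are hypothetical; `check_fn` only stands in for the real `check_word()` helper, whose signature is not shown in this thread. Each worker takes one contiguous slice of the range and polls a shared atomic flag so that every thread stops soon after the first failure.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-value check (the PR calls its helper
// check_word()). Returns true when the given bit pattern passes.
typedef bool (*check_fn)(uint32_t word);

// Sweeps the values [0, total) using up to num_threads workers.
// Returns true only if every checked value passed.
bool parallel_sweep(uint64_t total, unsigned num_threads, check_fn check) {
  assert(check != nullptr);
  if (num_threads == 0) {
    num_threads = 1; // std::thread::hardware_concurrency() may return 0
  }
  std::atomic<bool> failed(false);
  std::vector<std::thread> workers;
  // Ceiling division so the slices cover the whole range.
  const uint64_t chunk = (total + num_threads - 1) / num_threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    const uint64_t begin = chunk * t;
    const uint64_t end = std::min(begin + chunk, total);
    if (begin >= end) {
      break; // more threads than values: nothing left to hand out
    }
    workers.push_back(std::thread([begin, end, check, &failed]() {
      for (uint64_t w = begin; w < end; ++w) {
        // Fail-fast: give up as soon as any worker has reported a failure.
        if (failed.load(std::memory_order_relaxed)) {
          return;
        }
        if (!check(static_cast<uint32_t>(w))) {
          failed.store(true, std::memory_order_relaxed);
          return;
        }
      }
    }));
  }
  for (std::thread &worker : workers) {
    worker.join(); // join() makes the workers' writes visible here
  }
  return !failed.load();
}
```

Under these assumptions, a full float32 sweep would be `parallel_sweep(uint64_t(1) << 32, std::thread::hardware_concurrency(), my_check)`, where `my_check` is whatever per-word check the test needs. The loop counter is 64-bit so that the upper bound of 2³² is representable.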

On a 96-core machine the runtime drops from ~1900 s to ~25 s (≈75×), with an identical result:

gcc 13  : PASS in 26s
clang 18: PASS in 22s

Notes:

  • The exhaustive tests are gated behind FASTFLOAT_EXHAUSTIVE (off by default and not
    built in CI), so this has no CI impact — it's purely a faster local/dev sweep.
  • tests/CMakeLists.txt now links Threads::Threads (with
    THREADS_PREFER_PTHREAD_FLAG).
  • C++11, compiles clean under the existing -Werror -Wall -Wextra -Weffc++ -Wconversion -Wsign-conversion -Wshadow set on gcc and clang; clang-format clean.

The same pattern applies to the sibling sweeps (exhaustive32, exhaustive32_64). I
kept this PR to one test for review, but I'm happy to extend it if you'd like.

Member

@lemire lemire left a comment

This looks good.

Same std::thread split as exhaustive32_midpoint; preserves each test's existing
failure behavior (abort for exhaustive32, stop-flag for exhaustive32_64).
@fcostaoliveira changed the title from "Parallelize the exhaustive midpoint test across hardware threads (~75x faster)" to "Parallelize the exhaustive float32 sweeps across hardware threads (~75-88x)" on Jun 1, 2026
@fcostaoliveira
Contributor Author

Extended to the sibling sweeps as offered — exhaustive32 and exhaustive32_64 now use the same std::thread split, preserving each test's existing failure behavior (abort for exhaustive32, stop-flag for exhaustive32_64).

96-core timings (gcc), all pass and clean under -Werror on gcc + clang:

| test | before | after |
|---|---|---|
| exhaustive32 | ~1055s | 12s |
| exhaustive32_64 | ~1555s | 18s |
| exhaustive32_midpoint | ~1839s | 22s |
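
For reference, the speedup figures in the PR title can be recomputed from the timings above. This is only a sketch of the arithmetic: the helper name `speedup` is hypothetical, and the "before" values are the approximate ones quoted here, so the ratios are approximate too.

```cpp
#include <cassert>

// Hypothetical helper: ratio of the single-threaded runtime to the parallel
// runtime, both in seconds.
double speedup(double before_seconds, double after_seconds) {
  assert(after_seconds > 0.0);
  return before_seconds / after_seconds;
}

// Ratios for the three sweeps, using the approximate timings quoted above:
//   speedup(1055.0, 12.0)  is roughly 87.9  (exhaustive32)
//   speedup(1555.0, 18.0)  is roughly 86.4  (exhaustive32_64)
//   speedup(1839.0, 22.0)  is roughly 83.6  (exhaustive32_midpoint)
```

The original description's ~1900 s to ~25 s gives roughly 76×, which accounts for the low end of the ~75-88x range.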

Happy to drop the two extra files back out if you'd prefer to keep this PR to just the midpoint test.

@lemire
Member

lemire commented Jun 1, 2026

It is fine. Merging.

@lemire lemire merged commit 06f3e27 into fastfloat:main Jun 1, 2026
35 checks passed